Systems and Methods for Area Efficient Data Encoding

ABSTRACT

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority to Russian Patent App. No. 2014104571 entitled “Systems and Methods for Area Efficient Data Encoding”, and filed Feb. 10, 2014 by Panteleev et al. The entirety of the aforementioned patent application is incorporated herein by reference for all purposes.

FIELD OF THE INVENTION

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.

BACKGROUND

Various data transfer systems have been developed including storage systems, cellular telephone systems, and radio transmission systems. In each of the systems, data is transferred from a sender to a receiver via some medium. For example, in a storage system, data is sent from a sender (i.e., a write function) to a receiver (i.e., a read function) via a storage medium. Encoding may involve vector multiplication by quasi-cyclic matrices. Such vector multiplication is complex both in terms of circuit design and the area required to implement the circuits. Such significant area requirements increase the costs of encoding devices.

Hence, for at least the aforementioned reasons, there exists a need in the art for advanced systems and methods for data processing.

SUMMARY

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.

Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes a cyclic convolution circuit and an encoded output circuit. The cyclic convolution circuit is operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output. The encoded output circuit is operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output.

This summary provides only a general outline of some embodiments of the invention. The phrases “in one embodiment,” “according to one embodiment,” “in various embodiments”, “in one or more embodiments”, “in particular embodiments” and the like generally mean the particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present invention, and may be included in more than one embodiment of the present invention. Importantly, such phrases do not necessarily refer to the same embodiment. Many other embodiments of the invention will become more fully apparent from the following detailed description, the appended claims and the accompanying drawings.

BRIEF DESCRIPTION OF THE FIGURES

A further understanding of the various embodiments of the present invention may be realized by reference to the figures which are described in remaining portions of the specification. In the figures, like reference numerals are used throughout several figures to refer to similar components. In some instances, a sub-label consisting of a lower case letter is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.

FIG. 1 shows a storage system having area efficient LDPC encoder circuitry in accordance with various embodiments of the present invention;

FIG. 2 shows a data transmission device including a transmitter having area efficient LDPC encoder circuitry in accordance with various embodiments of the present invention;

FIG. 3 shows a solid state memory circuit including a data processing circuit having area efficient LDPC encoder circuitry in accordance with some embodiments of the present invention;

FIG. 4 a shows a processing system including an area efficient LDPC encoder circuit in accordance with some embodiments of the present invention;

FIG. 4 b shows one implementation of an area efficient quasi-cyclic matrix multiplication circuit relying on a number of cyclic convolutions that may be used to implement the area efficient encoder circuit of FIG. 4 a;

FIG. 4 c depicts a cyclic convolution circuit that may be used to implement the area efficient quasi-cyclic matrix multiplication circuit of FIG. 4 b;

FIG. 5 a shows another implementation of an area efficient quasi-cyclic matrix multiplication circuit relying on a number of cyclic convolutions that may be used to implement the area efficient encoder circuit of FIG. 4 a; and

FIG. 5 b depicts one implementation of a parallel cyclic convolution circuit that may be used to implement the parallel cyclic convolution circuit of FIG. 5 a.

DETAILED DESCRIPTION OF SOME EMBODIMENTS

The present inventions are related to systems and methods for data processing, and more particularly to systems and methods for data encoding.

Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes one or more area efficient quasi-cyclic matrix multiplication circuit(s). Such quasi-cyclic matrix multiplication circuit(s) are designed as a number of cyclic convolutions. Using such an approach, it is possible to implement an encoder circuit for quasi-cyclic low density parity check (LDPC) codes that is smaller and offers several times higher throughput compared with an encoder circuit relying exclusively on shift registers and/or barrel shifters to perform quasi-cyclic matrix multiplications. In some cases, the quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions may use a combination of Winograd and Agarwal-Cooley fast convolution algorithms, though many other fast convolution algorithms can be used as well. Such Winograd and Agarwal-Cooley algorithms are discussed in detail in Richard E. Blahut, “Fast Algorithms for Digital Signal Processing,” Addison-Wesley, Reading, MA 1985. The entirety of the aforementioned reference is incorporated herein by reference for all purposes.

Most encoding algorithms for quasi-cyclic LDPC codes can be roughly divided into two main categories: generator matrix based (G-based) and parity-check matrix based (H-based). In a G-based encoder, a systematic quasi-cyclic generator matrix G=(I|Gp) is used, where Gp is a quasi-cyclic matrix, which is usually dense. The parity bits vector p is obtained by the formula p=uGp, where u is a user bits vector. In an H-based encoder, a quasi-cyclic parity-check matrix of the code is usually represented as H=(Hu|Hp), where Hu and Hp are its quasi-cyclic sub-matrices corresponding to the user and parity parts of the codeword. Subsequently, the vector s^(T)=H_(u)u^(T) is calculated, and based thereon the parity vector p is determined as a solution of the equation H_(p)p^(T)=s^(T). As can be seen from the above description, both categories of encoders involve a step of multiplying a vector by a quasi-cyclic matrix. As such, embodiments of the present invention offering improved quasi-cyclic multiplication circuits offer improved encoding.
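As an illustration of the G-based case only, the following minimal sketch computes p=uGp over GF(2) and forms the systematic codeword (u|p). The function names, the toy 3×3 circulant used for Gp, and the example user bits are assumptions chosen for illustration and are not taken from any particular embodiment.

```python
def vec_mat_gf2(u, G):
    """Multiply row vector u by matrix G over GF(2) (AND for product, XOR for sum)."""
    cols = len(G[0])
    out = []
    for c in range(cols):
        bit = 0
        for r in range(len(u)):
            bit ^= u[r] & G[r][c]
        out.append(bit)
    return out


def encode_systematic(u, Gp):
    """Return the systematic codeword (u | p) with p = u * Gp over GF(2)."""
    return list(u) + vec_mat_gf2(u, Gp)


# Hypothetical toy parity part: a single 3x3 circulant with first column (1, 0, 1).
Gp = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
u = [1, 0, 1]  # hypothetical user bits
codeword = encode_systematic(u, Gp)
print(codeword)
```

In a practical code Gp would be a large quasi-cyclic matrix, which is why the dense product above is replaced by cyclic convolutions in the circuits described herein.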

Various embodiments of the present invention provide data processing systems that include an encoder circuit. The encoder circuit includes a cyclic convolution circuit and an encoded output circuit. The cyclic convolution circuit is operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output. The encoded output circuit is operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output. In some cases, the data processing system is implemented as part of a storage device, or a communication device. In various cases, the data processing system is implemented as part of an integrated circuit.

In some instances of the aforementioned embodiments, the encoded output circuit includes: a vector adder circuit operable to sum instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum, and a shift register circuit operable to shift instances of the vector sum to yield the instances of the cyclic convolution output. In some cases, the encoded data set is generated based at least in part on the cyclic convolution output. In various cases, the number of instances of the vector sum is l, where l corresponds to the number of sub-vectors into which the user data input is divided.

In various instances of the aforementioned embodiments, the cyclic convolution circuit includes: a first cyclic convolution circuit and a second cyclic convolution circuit. In such instances, the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input. In some cases, the first portion of the vector input is a 3×1 portion of the vector input, and wherein the second portion of the vector input is a 3×4 portion of the vector input. In other cases, the first portion of the vector input is a 3×4 portion of the vector input, and wherein the second portion of the vector input is a 3×8 portion of the vector input.

In one or more instances of the aforementioned embodiments, the systems further include a transformation circuit operable to transform a first number of bits of the user data input into a second number of bits of the vector input. In some such instances, the first number of bits is 128, and the second number of bits is 255. In various such instances, the cyclic convolution circuit includes: a first cyclic convolution circuit, a second cyclic convolution circuit, and a combining circuit. In such instances, the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input. The combining circuit is operable to combine at least the first sub-output and the second sub-output to yield a non-transformed output. In some cases, the system further includes an inverse transformation circuit operable to transform the second number of bits of the non-transformed output to the first number of bits of a cyclic convolution output.

Other embodiments of the present invention provide methods for data encoding that include: receiving a user data input; using a cyclic convolution circuit to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and generating an encoded data set corresponding to the user data input and based at least in part on the convolved output. In some instances of the aforementioned embodiments, the methods further include transforming a first number of bits of the user data input into a second number of bits to yield the vector input. In some cases, the first number of bits is 128, and the second number of bits is 255.

In one or more instances of the aforementioned embodiments, the cyclic convolution circuit includes: a first cyclic convolution circuit and a second cyclic convolution circuit. The first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit. The first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input. In some cases, the methods further include: adding instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and shifting instances of the vector sum to yield the instances of the cyclic convolution output.

Turning to FIG. 1, a storage system 100 is shown that includes a read channel 110 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. Storage system 100 may be, for example, a hard disk drive. Storage system 100 also includes a preamplifier 170, an interface controller 120, a hard disk controller 166, a motor controller 168, a spindle motor 172, a disk platter 178, and a read/write head 176. Interface controller 120 controls addressing and timing of data to/from disk platter 178, and interacts with a host controller (not shown). The data on disk platter 178 consists of groups of magnetic signals that may be detected by read/write head assembly 176 when the assembly is properly positioned over disk platter 178. In one embodiment, disk platter 178 includes magnetic signals recorded in accordance with either a longitudinal or a perpendicular recording scheme.

In a typical read operation, read/write head 176 is accurately positioned by motor controller 168 over a desired data track on disk platter 178. Motor controller 168 both positions read/write head 176 in relation to disk platter 178 and drives spindle motor 172 by moving read/write head assembly 176 to the proper data track on disk platter 178 under the direction of hard disk controller 166. Spindle motor 172 spins disk platter 178 at a determined spin rate (RPMs). Once read/write head 176 is positioned adjacent the proper data track, magnetic signals representing data on disk platter 178 are sensed by read/write head 176 as disk platter 178 is rotated by spindle motor 172. The sensed magnetic signals are provided as a continuous, minute analog signal representative of the magnetic data on disk platter 178. This minute analog signal is transferred from read/write head 176 to read channel circuit 110 via preamplifier 170. Preamplifier 170 is operable to amplify the minute analog signals accessed from disk platter 178. In turn, read channel circuit 110 decodes and digitizes the received analog signal to recreate the information originally written to disk platter 178. This data is provided as read data 103 to a receiving circuit. A write operation is substantially the opposite of the preceding read operation with write data 101 being provided to read channel circuit 110. This data is then encoded and written to disk platter 178.

In operation, data stored to disk platter 178 is encoded using an area efficient encoder circuit to yield an encoded data set. The encoded data set is then written to disk platter 178, and later accessed from disk platter 178 and decoded using a decoder circuit. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4 b-4 c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5 a-5 b. The area efficient encoder circuit may operate similar to that discussed below in relation to FIG. 6.

It should be noted that storage system 100 may be integrated into a larger storage system such as, for example, a RAID (redundant array of inexpensive disks or redundant array of independent disks) based storage system. Such a RAID storage system increases stability and reliability through redundancy, combining multiple disks as a logical unit. Data may be spread across a number of disks included in the RAID storage system according to a variety of algorithms and accessed by an operating system as if it were a single disk. For example, data may be mirrored to multiple disks in the RAID storage system, or may be sliced and distributed across multiple disks in a number of techniques. If a small number of disks in the RAID storage system fail or become unavailable, error correction techniques may be used to recreate the missing data based on the remaining portions of the data from the other disks in the RAID storage system. The disks in the RAID storage system may be, but are not limited to, individual storage systems such as storage system 100, and may be located in close proximity to each other or distributed more widely for increased security. In a write operation, write data is provided to a controller, which stores the write data across the disks, for example by mirroring or by striping the write data. In a read operation, the controller retrieves the data from the disks. The controller then yields the resulting read data as if the RAID storage system were a single disk.

A data decoder circuit used in relation to read channel circuit 110 may be, but is not limited to, a low density parity check (LDPC) decoder circuit as are known in the art. Such low density parity check technology is applicable to transmission of information over virtually any channel or storage of information on virtually any media. Transmission applications include, but are not limited to, optical fiber, radio frequency channels, wired or wireless local area networks, digital subscriber line technologies, wireless cellular, Ethernet over any medium such as copper or optical fiber, cable channels such as cable television, and Earth-satellite communications. Storage applications include, but are not limited to, hard disk drives, compact disks, digital video disks, magnetic tapes and memory devices such as DRAM, NAND flash, NOR flash, other non-volatile memories and solid state drives.

In addition, it should be noted that storage system 100 may be modified to include solid state memory that is used to store data in addition to the storage offered by disk platter 178. This solid state memory may be used in parallel to disk platter 178 to provide additional storage. In such a case, the solid state memory receives and provides information directly to read channel circuit 110. Alternatively, the solid state memory may be used as a cache where it offers faster access time than that offered by disk platter 178. In such a case, the solid state memory may be disposed between interface controller 120 and read channel circuit 110 where it operates as a pass through to disk platter 178 when requested data is not available in the solid state memory or when the solid state memory does not have sufficient storage to hold a newly written data set. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of storage systems including both disk platter 178 and a solid state memory.

Turning to FIG. 2, a data transmission system 200 is shown that includes a transmitter 210 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. Transmitter 210 transmits encoded data via a transfer medium 230. Transfer medium 230 may be a wired or wireless transfer medium. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of transfer mediums that may be used in relation to different embodiments of the present invention. The encoded data is received from transfer medium 230 by receiver 220. In operation, transmitter 210 encodes user data using an area efficient encoder circuit to yield an encoded data set. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4 b-4 c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5 a-5 b. The area efficient encoder circuit may operate similar to that discussed below in relation to FIG. 6.

Turning to FIG. 3, another storage system 300 is shown that includes a data processing circuit 310 having area efficient LDPC encoder circuitry in accordance with one or more embodiments of the present invention. A host controller circuit 305 receives data to be stored (i.e., write data 301). Solid state memory access controller circuit 340 may be any circuit known in the art that is capable of controlling access to and from a solid state memory 350. Solid state memory access controller circuit 340 encodes a received data set to yield an encoded data set. The encoding is done using an area efficient LDPC encoder circuit, and results in an encoded data set that is stored to solid state memory 350. Solid state memory 350 may be any solid state memory known in the art. In some embodiments of the present invention, solid state memory 350 is a flash memory. In some cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) designed as a number of cyclic convolutions such as that discussed below in relation to FIGS. 4 b-4 c. In particular cases, the area efficient encoder circuit may be implemented to include quasi-cyclic matrix multiplication circuit(s) that are designed to use a combination of Winograd and Agarwal-Cooley fast convolution algorithms such as one described below in relation to FIGS. 5 a-5 b. The area efficient encoder circuit may operate similar to that discussed below in relation to FIG. 6.

Turning to FIG. 4 a, a data processing system 400 is shown that includes an area efficient LDPC encoder circuit 420 in accordance with some embodiments of the present invention. Area efficient LDPC encoder circuit 420 applies a data encoding algorithm using matrix multiplication implemented as a number of cyclic convolutions. Area efficient LDPC encoder circuit 420 applies the encoding algorithm to an original data input 405 to yield an encoded output 439. Application of the encoding algorithm includes performing a number of vector multiplications by quasi-cyclic matrices implemented as a number of cyclic convolutions. The vector multiplications by quasi-cyclic matrices may be implemented similar to that discussed below in relation to FIGS. 4 b-4 c.

Encoded output 439 is provided to a transmission circuit 430 that is operable to transmit the encoded data to a recipient via a medium 440. Transmission circuit 430 may be any circuit known in the art that is capable of transferring encoded output 439 via medium 440. Thus, for example, where data processing circuit 400 is part of a hard disk drive, transmission circuit 430 may include a read/write head assembly that converts an electrical signal into a series of magnetic signals appropriate for writing to a storage medium. Alternatively, where data processing circuit 400 is part of a wireless communication system, transmission circuit 430 may include a wireless transmitter that converts an electrical signal into a radio frequency signal appropriate for transmission via a wireless transmission medium. Transmission circuit 430 provides a transmission output to medium 440. Medium 440 provides a transmitted input that is the transmission output augmented with one or more errors introduced by the transference across medium 440.

Of note, original data input 405 may be any data set that is to be transmitted. For example, where data processing system 400 is a hard disk drive, original data input 405 may be a data set that is destined for storage on a storage medium. In such cases, medium 440 of data processing system 400 is a storage medium. As another example, where data processing system 400 is a communication system, original data input 405 may be a data set that is destined to be transferred to a receiver via a transfer medium. Such transfer mediums may be, but are not limited to, wired or wireless transfer mediums. In such cases, medium 440 of data processing system 400 is a transfer medium.

Data processing circuit 400 includes an analog processing circuit 450 that applies one or more analog functions to the transmitted input. Such analog functions may include, but are not limited to, amplification and filtering. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of pre-processing circuitry that may be used in relation to different embodiments of the present invention. In addition, analog processing circuit 450 converts the processed signal into a series of corresponding digital samples. Data processing circuitry 460 applies data detection and/or data decoding algorithms to the series of digital samples to yield a data output 465. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize a variety of data processing circuitry that may be used to recover original data input from the series of digital samples.

As background to understanding an area efficient quasi-cyclic matrix multiplication circuit used to implement the area efficient encoder circuit 420, an l×l matrix over GF(q) is called a circulant if it has the following form:

$\begin{pmatrix}a_{0} & a_{l - 1} & \ldots & a_{1} \\a_{1} & a_{0} & \ldots & a_{2} \\\vdots & \vdots & \ddots & \vdots \\a_{l - 1} & a_{l - 2} & \ldots & a_{0}\end{pmatrix}.$

Such a circulant matrix can be uniquely represented by its first column (a₀, a₁, . . . , a_(l−1))^(T), and it can be seen that the multiplication of a vector by a circulant matrix can be written in the following way:

$\begin{bmatrix}c_{0} \\c_{1} \\\vdots \\c_{l - 1}\end{bmatrix} = {{\begin{pmatrix}a_{0} & a_{l - 1} & \ldots & a_{1} \\a_{1} & a_{0} & \ldots & a_{2} \\\vdots & \vdots & \ddots & \vdots \\a_{l - 1} & a_{l - 2} & \ldots & a_{0}\end{pmatrix}\begin{bmatrix}b_{0} \\b_{1} \\\vdots \\b_{l - 1}\end{bmatrix}}.}$

The aforementioned multiplication may be represented in the following way:

$c_{i} = \sum\limits_{j = 0}^{l - 1} a_{j} b_{(i - j) \bmod l}.$

The vector c=(c₀, . . . , c_(l−1))^(T) is referred to herein as a cyclic convolution of the vectors a=(a₀, . . . , a_(l−1))^(T) and b=(b₀, . . . , b_(l−1))^(T), and for simplicity is denoted as a*b.
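The equivalence between the circulant matrix product and the cyclic convolution a*b can be checked numerically. The following is a minimal sketch, assuming a prime q so that arithmetic in GF(q) reduces to integer arithmetic modulo q; the function names and the length-three example vectors are chosen here for illustration only.

```python
def cyclic_convolution(a, b, q=2):
    """c_i = sum_j a_j * b_((i - j) mod l), computed modulo a prime q."""
    l = len(a)
    if len(b) != l:
        raise ValueError("both vectors must have the same length l")
    return [sum(a[j] * b[(i - j) % l] for j in range(l)) % q for i in range(l)]


def circulant(a):
    """Build the l x l circulant matrix whose first column is a."""
    l = len(a)
    return [[a[(i - j) % l] for j in range(l)] for i in range(l)]


def mat_vec(A, b, q=2):
    """Multiply matrix A by column vector b modulo a prime q."""
    return [sum(x * y for x, y in zip(row, b)) % q for row in A]


a = [1, 0, 1]  # hypothetical first column of a circulant
b = [1, 1, 0]  # hypothetical input vector
c = cyclic_convolution(a, b)
print(c, mat_vec(circulant(a), b))
```

Both computations yield the same vector, which is the property that allows a circulant block to be stored as its first column only.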

A quasi-circulant matrix may be represented as follows:

${A = \begin{pmatrix}A_{11} & \ldots & A_{1n} \\\vdots & \ddots & \vdots \\A_{m1} & \ldots & A_{mn}\end{pmatrix}},$

where each block A_(ij), i=1 to m, j=1 to n, is an l×l circulant matrix over a finite field GF(q). Using a column vector u=(u₁, . . . , u_(n))^(T), where sub-vectors u₁, . . . , u_(n) are of length l, multiplying u by the aforementioned quasi-circulant matrix yields:

${\begin{bmatrix}v_{1} \\\vdots \\v_{m}\end{bmatrix} = {\begin{pmatrix}A_{11} & \ldots & A_{1n} \\\vdots & \ddots & \vdots \\A_{m1} & \ldots & A_{mn}\end{pmatrix}\begin{bmatrix}u_{1} \\\vdots \\u_{n}\end{bmatrix}}},$

where each sub-vector v_(i) of length l is given by the following formula:

$v_{i} = A_{i1}u_{1} + \ldots + A_{in}u_{n}$; for i=1 to m.

Applying cyclic convolution, the preceding formula for each sub-vector v_(i) of length l may be re-written as:

$v_{i} = a_{i1}*u_{1} + \ldots + a_{in}*u_{n}$; for i=1 to m,

where a_(ij) is the first column of the aforementioned circulant matrix A_(ij); for i=1 to m, and j=1 to n. Thus, quasi-cyclic multiplication can be obtained by performing m×n cyclic convolutions and m×(n−1) vector additions over GF(q).
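The decomposition into m×n cyclic convolutions can be sketched as follows for GF(2), where multiplication is a bitwise AND and addition is an XOR. The function names and the m=2, n=2, l=3 example data are assumptions for illustration. For brevity the sketch starts each accumulation from an all-zero vector, so it performs one more addition per row than the m×(n−1) count given above.

```python
def cyclic_convolution(a, b):
    """Cyclic convolution of two equal-length bit vectors over GF(2)."""
    l = len(a)
    out = []
    for i in range(l):
        bit = 0
        for j in range(l):
            bit ^= a[j] & b[(i - j) % l]
        out.append(bit)
    return out


def vec_add(x, y):
    """Vector addition over GF(2), i.e. element-wise XOR."""
    return [xi ^ yi for xi, yi in zip(x, y)]


def quasi_cyclic_multiply(first_cols, u):
    """Compute v_i = a_i1*u_1 + ... + a_in*u_n for i = 1..m.

    first_cols[i][j] holds a_ij, the first column of circulant block A_ij;
    u[j] holds sub-vector u_j. All vectors have length l.
    """
    l = len(u[0])
    v = []
    for row in first_cols:
        acc = [0] * l
        for a_ij, u_j in zip(row, u):
            acc = vec_add(acc, cyclic_convolution(a_ij, u_j))
        v.append(acc)
    return v


# Hypothetical example with m = 2 block rows, n = 2 block columns, l = 3.
first_cols = [
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 1]],
]
u = [[1, 1, 0], [0, 1, 1]]
v = quasi_cyclic_multiply(first_cols, u)
print(v)
```

Only the first columns a_(ij) are stored, which requires l values per block rather than the l×l values of the full circulant.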

Turning to FIG. 4 b, an implementation of an area efficient quasi-cyclic matrix multiplication circuit 470 relying on a number of cyclic convolutions is shown that may be used to implement the matrix multiplication circuitry of area efficient encoder circuit 420 of FIG. 4 a. Area efficient quasi-cyclic matrix multiplication circuit 470 includes a read only memory circuit 475 pre-programmed to include the first columns of circulant matrices 478 (i.e., the first columns a_(ij) of the aforementioned A_(ij)).

Original data input 405 (i.e., u_(j)) and the first columns of circulant matrices 478 (i.e., a_(ij)) are provided to a cyclic convolution circuit 485 that applies cyclic convolution to the received inputs to yield a convolved output 482 (i.e., a_(ij)*u_(j)). Convolved output 482 is provided to a vector addition circuit 490 that is operable to calculate the sum of two vectors of length l over GF(q). In some embodiments of the present invention, vector addition circuit 490 is implemented using XOR gates as is known in the art. In particular, vector addition circuit 490 calculates the sum of convolved output 482 and an accumulated cyclic convolution output 497 over a length l. A resulting vector sum 492 is stored to a shift register circuit 495 where it is shifted over the length l with the final shift yielding the final value of cyclic convolution output 497. Initially, all of the values in shift register circuit 495 are zeros. The final value of cyclic convolution output 497 may be represented by the following equation:

cyclic convolution output 497 = $a_{i1}*u_{1} + \ldots + a_{in}*u_{n}$; for i=1 to m.

The approach used in area efficient quasi-cyclic matrix multiplication circuit 470 operates over m×n clock cycles plus the delay of cyclic convolution circuit 485. Original data input 405 (u_(j)) and the first columns of circulant matrices 478 (a_(ij)) should be in the following order:

$\begin{array}{c|cccccccccc}u_{j} & u_{1} & u_{1} & \ldots & u_{1} & u_{2} & u_{2} & \ldots & u_{2} & u_{3} & \ldots \\a_{ij} & a_{11} & a_{21} & \ldots & a_{m1} & a_{12} & a_{22} & \ldots & a_{m2} & a_{13} & \ldots\end{array}$
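The input ordering and the accumulate-and-shift behavior can be modeled in software. The following is a behavioral sketch only, over GF(2), and it rests on an assumption that is not stated in the text: the shift register is modeled as m vector slots, so that the partial sum for row i reappears at the register output exactly m cycles after it was stored. The names `stream_schedule` and `accumulate` and the example data are hypothetical.

```python
from collections import deque


def cyclic_convolution(a, b):
    """Cyclic convolution of two equal-length bit vectors over GF(2)."""
    l = len(a)
    out = []
    for i in range(l):
        bit = 0
        for j in range(l):
            bit ^= a[j] & b[(i - j) % l]
        out.append(bit)
    return out


def stream_schedule(first_cols, u):
    """Yield (u_j, a_ij) pairs in the order u_1/a_11, u_1/a_21, ..., u_2/a_12, ..."""
    m, n = len(first_cols), len(u)
    for j in range(n):
        for i in range(m):
            yield u[j], first_cols[i][j]


def accumulate(first_cols, u):
    """Model of the adder plus shift register loop, one pair per clock cycle."""
    m, l = len(first_cols), len(u[0])
    # Assumption: m vector slots, all initialized to zero.
    shift_reg = deque([0] * l for _ in range(m))
    for u_j, a_ij in stream_schedule(first_cols, u):
        convolved = cyclic_convolution(a_ij, u_j)
        feedback = shift_reg.popleft()          # oldest partial sum
        vector_sum = [x ^ y for x, y in zip(convolved, feedback)]
        shift_reg.append(vector_sum)            # shift the new sum in
    return list(shift_reg)                      # v_1 .. v_m after m*n cycles


first_cols = [
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 1]],
]
u = [[1, 1, 0], [0, 1, 1]]
result = accumulate(first_cols, u)
print(result)
```

Under this model the loop body runs m×n times, matching the m×n clock cycles noted above, and the register contents return to row order after every group of m cycles.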

Turning to FIG. 4 c, one implementation of a cyclic convolution circuit 900 for a length l of three is shown that may be used to implement area efficient quasi-cyclic matrix multiplication circuit 470 of FIG. 4 b. As shown, cyclic convolution circuit 900 receives two vectors each of length three (i.e., ‘a’ and ‘b’). Vector ‘a’ includes a vector element 902 (a₀), a vector element 904 (a₁), and a vector element 906 (a₂). Vector ‘b’ includes a vector element 908 (b₀), a vector element 910 (b₁), and a vector element 912 (b₂). Where cyclic convolution circuit 900 is used in relation to area efficient quasi-cyclic matrix multiplication circuit 470, vector ‘a’ corresponds to original data input 405 (i.e., u_(j)), and vector ‘b’ corresponds to the first columns of circulant matrices 478 (i.e., a_(ij)).

Vector element 902 is provided to a multiplier circuit 922 where it is multiplied by vector element 908 to yield a product 942; vector element 902 is provided to a multiplier circuit 928 where it is multiplied by vector element 910 to yield a product 948; and vector element 902 is provided to a multiplier circuit 938 where it is multiplied by vector element 912 to yield a product 958. Vector element 904 is provided to a multiplier circuit 924 where it is multiplied by vector element 912 to yield a product 944; vector element 904 is provided to a multiplier circuit 930 where it is multiplied by vector element 908 to yield a product 950; and vector element 904 is provided to a multiplier circuit 936 where it is multiplied by vector element 910 to yield a product 956. Vector element 906 is provided to a multiplier circuit 926 where it is multiplied by vector element 910 to yield a product 946; vector element 906 is provided to a multiplier circuit 932 where it is multiplied by vector element 912 to yield a product 952; and vector element 906 is provided to a multiplier circuit 934 where it is multiplied by vector element 908 to yield a product 954.

Product 942, product 944, and product 946 are provided to an adder circuit 962 where they are summed to yield a vector component 972 (c₀). Product 948, product 950, and product 952 are provided to an adder circuit 964 where they are summed to yield a vector component 974 (c₁). Product 954, product 956, and product 958 are provided to an adder circuit 966 where they are summed to yield a vector component 976 (c₂).
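The nine multipliers and three adders described above can be mirrored one-for-one in software. The sketch below assumes operation over GF(2), so that each multiplier is an AND and each adder is an XOR; the variable names, which reuse the reference numerals of the products, and the example inputs are chosen for illustration.

```python
def cyclic_convolution_3(a, b):
    """Direct length-3 cyclic convolution over GF(2): nine ANDs, three 3-input XORs."""
    a0, a1, a2 = a
    b0, b1, b2 = b

    # Products feeding the adder for c0.
    p942 = a0 & b0
    p944 = a1 & b2
    p946 = a2 & b1
    # Products feeding the adder for c1.
    p948 = a0 & b1
    p950 = a1 & b0
    p952 = a2 & b2
    # Products feeding the adder for c2.
    p954 = a2 & b0
    p956 = a1 & b1
    p958 = a0 & b2

    c0 = p942 ^ p944 ^ p946
    c1 = p948 ^ p950 ^ p952
    c2 = p954 ^ p956 ^ p958
    return [c0, c1, c2]


print(cyclic_convolution_3([1, 0, 1], [1, 1, 0]))
```

The multiplier count of such a direct structure grows as l×l, which is why it is practical only for small l.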

Where the length l of convolved output 482 is small, implementation of area efficient quasi-cyclic matrix multiplication circuit 470 using blocks similar to that discussed in FIG. 4 c may be acceptable. However, where the length l of convolved output 482 becomes larger, cyclic convolution circuit 485 may be implemented using one or more fast cyclic convolution algorithms known in the art. Turning to FIG. 5 a, another implementation of an area efficient quasi-cyclic matrix multiplication circuit 500 is shown that relies on a number of cyclic convolutions that may be used to implement the area efficient encoder circuit 420 of FIG. 4 a. Area efficient quasi-cyclic matrix multiplication circuit 500 utilizes a parallel cyclic convolution circuit 540 implemented using a combination of Winograd and Agarwal-Cooley fast convolution algorithms to operate on a binary field GF(2).

Area efficient quasi-cyclic matrix multiplication circuit 500 includes a register circuit 510 that holds a number of bits of an original data input 505 in parallel. In one embodiment of the present invention, the number of bits is one-hundred twenty-eight (128) bits. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. The registered data is accessed in parallel from register circuit 510 as a registered vector 515. Registered vector 515 is provided to a transformation circuit 520 where the number of bits in registered vector 515 is increased to yield a transformed vector 525. The operation of transformation circuit 520 is more fully discussed below. In one embodiment of the present invention, the number of bits in transformed vector 525 is two-hundred fifty-five (255) bits. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. Transformed vector 525 is stored to a register circuit 530 that provides the registered data as a registered vector 535 (a′).

Similarly, area efficient quasi-cyclic matrix multiplication circuit 500 includes a register circuit 511 that holds a number of bits of an original data input 506 in parallel. In one embodiment of the present invention, the number of bits is one-hundred twenty-eight (128) bits. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. The registered data is accessed in parallel from register circuit 511 as a registered vector 516. Registered vector 516 is provided to a transformation circuit 521 where the number of bits in registered vector 516 is increased to yield a transformed vector 526. The operation of transformation circuit 521 is more fully discussed below. In one embodiment of the present invention, the number of bits in transformed vector 526 is two-hundred fifty-five (255) bits. Based upon the disclosure provided herein, one of ordinary skill in the art will recognize other bit widths that may be used in relation to different embodiments of the present invention. Transformed vector 526 is stored to a register circuit 531 that provides the registered data as a registered vector 536 (b′).

Assuming the width of registered vector 535 and registered vector 536 is 255, parallel cyclic convolution circuit 540 splits each of registered vector 535 and registered vector 536 into chunks (s₀⁽¹⁾, . . . , s₀⁽¹²⁾, s₁⁽¹⁾, . . . , s₁⁽¹²⁾, s₂⁽¹⁾, . . . , s₂⁽¹²⁾), where the 1-bit chunks s₀⁽¹⁾, s₁⁽¹⁾, s₂⁽¹⁾ are considered as elements of GF(2); the 4-bit chunks s₀⁽²⁾, s₁⁽²⁾, s₂⁽²⁾ are considered as elements of GF(2⁴); and the 8-bit chunks s₀⁽³⁾, s₁⁽³⁾, s₂⁽³⁾, . . . , s₀⁽¹²⁾, s₁⁽¹²⁾, s₂⁽¹²⁾ are considered as elements of GF(2⁸). These chunk widths account for all 3×(1+4+10×8)=255 bits of each registered vector.

The aforementioned chunks are distributed between twelve cyclic convolution blocks (represented by blocks 550, 560, 570, 580) over the finite fields GF(2), GF(2⁴), and GF(2⁸) as shown in FIG. 5 b. The irreducible polynomial defining GF(2⁴) is x⁴+x+1, and the irreducible polynomial defining GF(2⁸) is x⁸+x⁴+x³+x+1. Turning to FIG. 5 b, the ith cyclic convolution block calculates the cyclic convolution of the chunks a₀⁽ⁱ⁾, a₁⁽ⁱ⁾, a₂⁽ⁱ⁾ of registered vector 535 (a′) and the chunks b₀⁽ⁱ⁾, b₁⁽ⁱ⁾, b₂⁽ⁱ⁾ of registered vector 536 (b′). Each of the twelve cyclic convolution blocks calculates a cyclic convolution of length three (3) and can be implemented similar to cyclic convolution circuit 900 discussed above in relation to FIG. 4 c. In particular, a 3×1 block a₀⁽¹⁾, a₁⁽¹⁾, a₂⁽¹⁾ is convolved with a 3×1 block b₀⁽¹⁾, b₁⁽¹⁾, b₂⁽¹⁾ by block 550 to yield a 3×1 convolved output c₀⁽¹⁾, c₁⁽¹⁾, c₂⁽¹⁾. A 3×4 block a₀⁽²⁾, a₁⁽²⁾, a₂⁽²⁾ is convolved with a 3×4 block b₀⁽²⁾, b₁⁽²⁾, b₂⁽²⁾ by block 560 to yield a 3×4 convolved output c₀⁽²⁾, c₁⁽²⁾, c₂⁽²⁾. A 3×8 block a₀⁽³⁾, a₁⁽³⁾, a₂⁽³⁾ is convolved with a 3×8 block b₀⁽³⁾, b₁⁽³⁾, b₂⁽³⁾ by block 570 to yield a 3×8 convolved output c₀⁽³⁾, c₁⁽³⁾, c₂⁽³⁾. A 3×8 block a₀⁽¹²⁾, a₁⁽¹²⁾, a₂⁽¹²⁾ is convolved with a 3×8 block b₀⁽¹²⁾, b₁⁽¹²⁾, b₂⁽¹²⁾ by block 580 to yield a 3×8 convolved output c₀⁽¹²⁾, c₁⁽¹²⁾, c₂⁽¹²⁾. The 3×8 blocks a₀^(4..11), a₁^(4..11), a₂^(4..11) and b₀^(4..11), b₁^(4..11), b₂^(4..11) are convolved by respective blocks (not shown) to yield respective 3×8 convolved outputs c₀^(4..11), c₁^(4..11), c₂^(4..11). Parallel cyclic convolution circuit 540 merges the resulting convolved outputs c₀^(1..12), c₁^(1..12), c₂^(1..12) to yield a cyclic output 545 (c′).
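As a non-limiting illustration of one such block, the following sketch performs a length-three cyclic convolution whose elements are 4-bit or 8-bit field elements rather than single bits. It assumes that field elements are stored as integers whose bit k is the coefficient of xᵏ, and that multiplication is reduced modulo the field polynomials named above. The helper names `gf_mul` and `cyclic_convolution3` are hypothetical.

```python
GF16_POLY = 0b10011        # x^4 + x + 1
GF256_POLY = 0b100011011   # x^8 + x^4 + x^3 + x + 1


def gf_mul(x, y, poly, m):
    """Multiply x and y in GF(2^m), reducing modulo poly (degree m)."""
    result = 0
    while y:
        if y & 1:
            result ^= x        # add the current multiple of x
        y >>= 1
        x <<= 1                # multiply x by the field generator
        if x >> m:             # degree reached m, so reduce
            x ^= poly
    return result


def cyclic_convolution3(a, b, poly, m):
    """Length-three cyclic convolution over GF(2^m)."""
    c = [0, 0, 0]
    for k in range(3):
        for i in range(3):
            c[k] ^= gf_mul(a[i], b[(k - i) % 3], poly, m)
    return c


# Convolving with (1, 0, 0) leaves the other operand unchanged.
print(cyclic_convolution3([1, 0, 0], [5, 7, 9], GF16_POLY, 4))  # [5, 7, 9]
```

With m=1 and the polynomial x+1 the same routine degenerates to the single-bit case, so one parameterized block description can cover all twelve blocks in a software model.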

Returning to FIG. 5 a and assuming the width of registered vector 535 and registered vector 536 is 255, transformation circuit 520 and transformation circuit 521 each multiply their respective inputs, considered as vectors over GF(2), by a binary matrix (T). Cyclic output 545 is provided to a register circuit 552 which stores the 255-bit vector as a vector output 555. Vector output 555 is provided to an inverse transformation circuit 562 that reverses the transformation applied by transformation circuit 520 and transformation circuit 521. Inverse transformation circuit 562 multiplies vector output 555 over GF(2) by a binary matrix (T⁻¹). Such multiplications by transformation circuit 520, transformation circuit 521, and inverse transformation circuit 562 may be implemented using XOR gates as is known in the art.
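For illustration only, multiplication of a bit vector by a fixed binary matrix over GF(2) reduces to computing, for each output bit, the XOR of the input bits selected by the ones in the corresponding matrix row. The sketch below assumes that each row and the input vector are packed into integers (bit j corresponds to column j); the 3×3 matrix shown is a toy example and is not the matrix T of the present description.

```python
def gf2_matvec(rows, v):
    """Multiply a binary matrix (list of row bitmasks) by bit-vector v over GF(2)."""
    out = 0
    for i, row in enumerate(rows):
        # The parity of the selected bits equals their XOR.
        parity = bin(row & v).count("1") & 1
        out |= parity << i
    return out


toy_matrix = [0b011, 0b110, 0b101]   # hypothetical example rows
print(bin(gf2_matvec(toy_matrix, 0b011)))  # 0b110
```

Because the matrix is constant, each parity in a hardware realization would be a fixed XOR tree over the relevant input bits.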

In order to define the matrices T and T⁻¹, the following 3×3 block matrix (T_F) with 85 bits per column is defined:

$\begin{pmatrix}T_{85} & 0 & 0 \\0 & T_{85} & 0 \\0 & 0 & T_{85}\end{pmatrix},$

where T₈₅ is itself an 85×85 matrix, by the following row permutations: for all i=1 to 255, move row number 1+85((i−1) mod 3)+(i−1) mod 85 to place number i. The transformation matrix T is then obtained from T_F by removing the last 127 columns. Using the notation that T_F⁻¹ is the inverse of T_F, and that r_i is the ith row of T_F⁻¹, the inverse matrix T⁻¹ is obtained as follows:

$T^{-1} = \begin{pmatrix} r_{1} + r_{129} \\ r_{2} + r_{130} \\ \vdots \\ r_{127} + r_{255} \\ r_{128} \end{pmatrix}.$
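The row permutation and the row sums above can be explored with a short sketch. Two assumptions are made, neither of which is stated in the text: first, that the row index expression is read as 1+85·((i−1) mod 3)+((i−1) mod 85); and second, that removing the last 127 columns corresponds to zero-padding a 128-bit input to 255 bits, so that adding row i+128 to row i folds a length-255 result back to a length-128 cyclic convolution. All function names are hypothetical.

```python
import random


def source_row(i):
    """1-based row that is moved to place i (assumed reading of the formula)."""
    return 1 + 85 * ((i - 1) % 3) + (i - 1) % 85


permutation = [source_row(i) for i in range(1, 256)]
# Every row 1..255 is used exactly once, since 3 and 85 are coprime.
is_bijection = sorted(permutation) == list(range(1, 256))


def cyclic_conv_gf2(a, b):
    """Cyclic convolution of equal-length bit lists over GF(2)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        if a[i]:
            for j in range(n):
                c[(i + j) % n] ^= b[j]
    return c


def fold_255_to_128(c):
    """Add entry k+128 to entry k for k = 0..126; entry 127 is kept as is."""
    return [c[k] ^ c[k + 128] for k in range(127)] + [c[127]]


rng = random.Random(0)
a = [rng.randint(0, 1) for _ in range(128)]
b = [rng.randint(0, 1) for _ in range(128)]
padding = [0] * 127
wide = cyclic_conv_gf2(a + padding, b + padding)   # no wrap-around occurs
fold_matches = fold_255_to_128(wide) == cyclic_conv_gf2(a, b)
print(is_bijection, fold_matches)
```

Under these assumptions the folding step mirrors the structure of T⁻¹, in which rows 129 through 255 are added to rows 1 through 127 and row 128 is left unpaired.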

The aforementioned T₈₅ matrix is obtained by factoring the polynomial x⁸⁵+1 into irreducible factors (i.e., primes) over GF(2):

x⁸⁵+1=f⁽¹⁾(x) . . . f⁽¹²⁾(x),

where

f⁽¹⁾(x)=x+1,

f⁽²⁾(x)=x⁴+x³+x²+x+1,

f⁽³⁾(x)=x⁸+x⁷+x⁶+x⁴+x²+x+1,

f⁽⁴⁾(x)=x⁸+x⁷+x⁵+x+1,

f⁽⁵⁾(x)=x⁸+x⁷+x³+x+1,

f⁽⁶⁾(x)=x⁸+x⁵+x⁴+x³+1,

f⁽⁷⁾(x)=x⁸+x⁵+x⁴+x³+x²+x+1,

f⁽⁸⁾(x)=x⁸+x⁶+x⁵+x⁴+x²+x+1,

f⁽⁹⁾(x)=x⁸+x⁶+x⁵+x⁴+x³+x+1,

f⁽¹⁰⁾(x)=x⁸+x⁷+x⁶+x⁴+x³+x²+1,

f⁽¹¹⁾(x)=x⁸+x⁷+x⁵+x⁴+x³+x²+1, and

f⁽¹²⁾(x)=x⁸+x⁷+x⁶+x⁵+x⁴+x³+1.
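The factorization can be checked by multiplying the twelve factors over GF(2). The sketch below represents each polynomial as an integer whose bit k is the coefficient of xᵏ. It takes f⁽²⁾(x) to be x⁴+x³+x²+x+1, on the reasoning that a factor with an even number of terms would be divisible by x+1 and therefore could not be irreducible. The helper names are hypothetical.

```python
def gf2_poly(*exponents):
    """Build a GF(2) polynomial from the exponents of its nonzero terms."""
    p = 0
    for e in exponents:
        p |= 1 << e
    return p


def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


FACTORS = [
    gf2_poly(1, 0),
    gf2_poly(4, 3, 2, 1, 0),
    gf2_poly(8, 7, 6, 4, 2, 1, 0),
    gf2_poly(8, 7, 5, 1, 0),
    gf2_poly(8, 7, 3, 1, 0),
    gf2_poly(8, 5, 4, 3, 0),
    gf2_poly(8, 5, 4, 3, 2, 1, 0),
    gf2_poly(8, 6, 5, 4, 2, 1, 0),
    gf2_poly(8, 6, 5, 4, 3, 1, 0),
    gf2_poly(8, 7, 6, 4, 3, 2, 0),
    gf2_poly(8, 7, 5, 4, 3, 2, 0),
    gf2_poly(8, 7, 6, 5, 4, 3, 0),
]

product = 1
for f in FACTORS:
    product = gf2_mul(product, f)

degrees = [f.bit_length() - 1 for f in FACTORS]
print(sum(degrees), product == gf2_poly(85, 0))
```

The degrees (one factor of degree 1, one of degree 4, and ten of degree 8) sum to 85, which matches the 1-bit, 4-bit, and 8-bit chunk widths used by the twelve cyclic convolution blocks.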

Let d_i=deg f⁽ⁱ⁾(x) for i=1 to 12, and let T_i be the d_i×85 matrix whose jth column is equal to (c₀, . . . , c_{d_i−1})ᵀ, where:

c₀+c₁x+ . . . +c_{d_i−1}x^{d_i−1}=x^{j−1} mod f⁽ⁱ⁾(x); for i=1 to 12, j=1 to 85. Each irreducible polynomial f⁽ⁱ⁾(x) defines the finite field F⁽ⁱ⁾=GF(2)[x]/(f⁽ⁱ⁾(x)) of polynomials over GF(2) modulo f⁽ⁱ⁾(x). The field F⁽¹⁾ is isomorphic to the field GF(2), the field F⁽²⁾ is isomorphic to the field GF(2⁴) defined by the irreducible polynomial x⁴+x+1, and the fields F⁽³⁾, . . . , F⁽¹²⁾ are isomorphic to the field GF(2⁸) defined by the irreducible polynomial x⁸+x⁴+x³+x+1. Let B_i be the d_i×d_i transition matrix from the field F⁽ⁱ⁾ to the corresponding isomorphic field. That is, if a binary column vector a represents an element of the field F⁽ⁱ⁾, then the vector B_i a represents the corresponding element in the isomorphic field. Then the matrix T₈₅ can be calculated by the following formula:

$T_{85} = {\begin{pmatrix}{B_{1}T_{1}} \\\vdots \\{B_{12}T_{12}}\end{pmatrix}.}$
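As a hedged illustration of the T_i blocks only, the following sketch builds the d_i×85 matrix whose jth column holds the coefficients of x^(j−1) mod f⁽ⁱ⁾(x). The transition matrices B_i are not modelled, because their entries depend on a choice of field isomorphism that is not reproduced here. Polynomials are assumed to be packed into integers (bit k is the coefficient of xᵏ), and the function name `build_T` is hypothetical.

```python
def build_T(f, n=85):
    """Return T_i as d rows of n bits; column j holds x^j mod f (0-based j)."""
    d = f.bit_length() - 1           # degree of f
    rows = [[0] * n for _ in range(d)]
    r = 1                            # x^0
    for j in range(n):
        for k in range(d):
            rows[k][j] = (r >> k) & 1    # coefficient c_k of the remainder
        r <<= 1                      # multiply by x
        if (r >> d) & 1:
            r ^= f                   # reduce modulo f
    return rows


T1 = build_T(0b11)                   # f(x) = x + 1
T6 = build_T(0b100111001)            # f(x) = x^8 + x^5 + x^4 + x^3 + 1
print(len(T1), len(T6), len(T6[0]))  # 1 8 85
```

For f(x)=x+1 every power of x reduces to 1, so the single row is all ones. For x⁸+x⁵+x⁴+x³+1, which divides x¹⁷+1, the columns repeat with period 17.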

The resulting matrix T₈₅ is as follows:11111111111111111111111111111111111111111111111111111111111111111111111111111111111111000110001100011000110001100011000110001100011000110001100011000110001100011000110001000110001100011000110001100011000110001100011000110001100011000110001100011000110001100101001010010100101001010010100101001010010100101001010010100101001010010100101001010111101111011110111101111011110111101111011110111101111011110111101111011110111101111111101001101100101111010011011001011110100110110010111101001101100101111010011011001001001110110111001010011101101110010100111011011100101001110110111001010011101101110010101100000001101001011000000011010010110000000110100101100000001101001011000000011010011100110011100000111001100111000001110011001110000011100110011100000111001100111000000111100010000010001111000100000100011110001000001000111100010000010001111000100000100110110010111101001101100101111010011011001011110100110110010111101001101100101111010000000101110111010000001011101110100000010111011101000000101110111010000001011101110100000100011110001000001000111100010000010001111000100000100011110001000001000111100011101111011010001100000110100100010001111101110101110001001001010111111110100111111001011101011100010010010101111111101001111110011101111011010001100000110100100010001111101001110000111101010010110010111100011100110101010110001010100010110101101100001000000100010110101101100001000000100111000011110101001011001011110001110011010101011000101001100100001100111100100110110111110000000111010000010100110001101110010100001011101101000110000011010010001000111110111010111000100100101011111111010011111100111011110110101010001011010110110000100000010011100001111010100101100101111000111001101010101100001011101100110010000110011110010011011011111000000011101000001010011000110111001010010111011111000100010010110000011000101101111011100111111001011111111010100100100011100110000011000101101111011100111111001011111111010100100100011101011101111100010001001000100
11001101110100001010011101100011001010000010111000000011111011011001001111001100000111110110110010011110011000010011001101110100001010011101100011001010000010111000011110000111001000000100001101101011010001010100011010101011001110001111010011010010101110101110111110001000100101100000110001011011110111001111110010111111110101001001000011111011011001001111001100001001100110111010000101001110110001100101000001011100000000011100100000010000110110101101000101010001101010101100111000111101001101001010111110001111111100010100011111111000101000111111110001010001111111100010100011111111000100001111011011110000011110110111100000111101101111000001111011011110000011110110111100000101010100001100001010101000011000010101010000110000101010100001100001010101000011001001100011001011010011000110010110100110001100101101001100011001011010011000110010110001000110110001000010001101100010000100011011000100001000110110001000010001101100010000111111110001010001111111100010100011111111000101000111111110001010001111111100010100100010010111101001000100101111010010001001011110100100010010111101001000100101111010000100011011000100001000110110001000010001101100010000100011011000100001000110110001111100010110011110000010111010000101010011011110110110000100011101110110010111110111101011111011111111000101100111100000101110100001010100110111101101100001000111011101100110010011001101011100001100000001001110101000100001110011100011111101011000110110100000110000000100111010100010000111001110001111110101100011011010001100100110011010111000100100000111101001011011100101011010101011110010001010000001101001111100110001001010001010100110111101101100001000111011101100101111101111111100010110011110000010111010000100001110011100011111101011000110110100011001001100110101110000110000000100111010100000011010011111001100010010100100100000111101001011011100101011010101011110010001011110110110001011100111110000111111011110100000011101011101100111011100010010101010001010100011110110110001011100111110000111111011110100
00001110101110110011101110001001010111010010111100011001100001010010010000110100010101111101010110101100000001011001011010100100100001101000101011111010101101011000000010110010110111010010111100011001100000100111100110101001100100110111111110010001101101001110010100001000100000110001110000001111110111101000000111010111011001110111000100101010100011110110110001011100111110001000011010001010111110101011010110000000101100101101110100101111000110011000010100100001000100000110001110000010011110011010100110010011011111111001000110110100111001011011100111010110110010100111101101011110100011011110001110010110001001001011101001101011010111101000110111100011100101100010010010111010011011011100111010110110010100111100110011010000110101010111000101111111100100111000000010110100100000011101110110000010011010101011100010111111110010011100000001011010010000001110111011000001001100110100000100001000111100110001100100010101001010110011111101111101010000011000011111000010100111010110110010100111101101011110100011011110001110010110001001001011101001101101110100000011101110110000010011001101000011010101011100010111111110010011100000001011010001010001000010001111001100011001000101010010101100111111011111010100000110000111110010000001011110111111000011111001110100011011011110001010101001000111011100110111010110001011110111111000011111001110100011011011110001010101001000111011100110111010111000011011000100111111110110010011001010110011110010000011100011000001000100001010011100100101001110010110110001001111111101100100110010101100111100100000111000110000010001000011010000000110101101010111110101000101100001001001010000110011000111101001011101101011101000110110111100010101010010001110111001101110101110000001011110111111000011111000110010101100111100100000111000110000010001000010100111001011011000100111111110110010001100110001111010010111011010011010000000110101101010111110101000101100001001001010100011010011100011110110001011110101101111001010011011010111001110110110010111010010001011100111
0110110010111010010010001101001110001111011000101111010110111100101001101101001011010000000111001001111111101000111010101011000010110011001000001101110111000000111010101011000010110011001000001101110111000000100101101000000011100100111111110100001010000111110000110000010101111101111110011010100101010001001100011001111000100001001101100101110100100100011010011100011110110001011110101101111001010011011010111001110000001110010011111111010001110101010110000101100110010000011011101110000001001011010000001010111110111111001101010010101000100110001100111100010000100010100001111100001110000110110111101100101010000101110100000111100110100011111111011111010011011101110000101010000101110100000111100110100011111111011111010011011101110001000011011011110110010010100100011001111100101100000010100010011110101010110101001110110100101111000001000101001000110011111001011000000101000100111101010101101010011101101001011110000010010011100111000010001010111001000000011000011101011001100100110001011011000110101111110011001010100001011101000001111001101000111111110111110100110111011100010000110110111100100100101001000110011111001011000000101000100111101010101101010011101101001011110000001101011111100011100111000010001010111001000000011000011101011001100100110001011011

It should be noted that the various blocks discussed in the above application may be implemented in integrated circuits along with other functionality. Such integrated circuits may include all of the functions of a given block, system or circuit, or a subset of the block, system or circuit. Further, elements of the blocks, systems or circuits may be implemented across multiple integrated circuits. Such integrated circuits may be any type of integrated circuit known in the art including, but not limited to, a monolithic integrated circuit, a flip chip integrated circuit, a multichip module integrated circuit, and/or a mixed signal integrated circuit. It should also be noted that various functions of the blocks, systems or circuits discussed herein may be implemented in either software or firmware. In some such cases, the entire system, block or circuit may be implemented using its software or firmware equivalent. In other cases, one part of a given system, block or circuit may be implemented in software or firmware, while other parts are implemented in hardware.

In conclusion, the invention provides novel systems, devices, methods and arrangements for data processing. While detailed descriptions of one or more embodiments of the invention have been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art without varying from the spirit of the invention. Therefore, the above description should not be taken as limiting the scope of the invention, which is defined by the appended claims.

What is claimed is:
 1. A data processing system, the system comprising: an encoder circuit including: a cyclic convolution circuit operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and an encoded output circuit operable to generate an encoded data set corresponding to the user data input and based at least in part on the convolved output.
 2. The data processing system of claim 1, wherein the encoded output circuit comprises: a vector adder circuit operable to sum instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and a shift register circuit operable to shift instances of the vector sum to yield the instances of the cyclic convolution output.
 3. The data processing system of claim 2, wherein the encoded data set is generated based at least in part on the cyclic convolution output.
 4. The data processing system of claim 2, wherein the number of instances of the vector sum is l, and wherein l corresponds to the number of sub-vectors into which the user data input is divided.
 5. The data processing system of claim 1, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; and a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input.
 6. The data processing system of claim 5, wherein the first portion of the vector input is a 3×1 portion of the vector input, and wherein the second portion of the vector input is a 3×4 portion of the vector input.
 7. The data processing system of claim 5, wherein the first portion of the vector input is a 3×4 portion of the vector input, and wherein the second portion of the vector input is a 3×8 portion of the vector input.
 8. The data processing system of claim 1, wherein the system further comprises: a transformation circuit operable to transform a first number of bits of the user data input into a second number of bits of the vector input.
 9. The data processing system of claim 8, wherein the first number of bits is 128, and wherein the second number of bits is 255.
 10. The data processing system of claim 8, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input to yield a first sub-output and the second cyclic convolution circuit operates on a second portion of the vector input to yield a second sub-output; and a combining circuit operable to combine at least the first sub-output and the second sub-output to yield a non-transformed output.
 11. The data processing system of claim 10, wherein the system further includes: an inverse transformation circuit operable to transform the second number of bits of the non-transformed output to the first number of bits of a cyclic convolution output.
 12. The data processing system of claim 1, wherein the data processing system is implemented as part of a device selected from a group consisting of: a storage device, and a communication device.
 13. The data processing system of claim 1, wherein the data processing system is implemented as part of an integrated circuit.
 14. A method for data encoding, the method comprising: receiving a user data input; using a cyclic convolution circuit to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and generating an encoded data set corresponding to the user data input and based at least in part on the convolved output.
 15. The method of claim 14, the method further comprising: transforming a first number of bits of the user data input into a second number of bits to yield the vector input.
 16. The method of claim 15, wherein the first number of bits is 128, and wherein the second number of bits is 255.
 17. The method of claim 14, wherein the cyclic convolution circuit includes: a first cyclic convolution circuit; and a second cyclic convolution circuit, wherein the first cyclic convolution circuit operates in parallel with the second cyclic convolution circuit, and wherein the first cyclic convolution circuit operates on a first portion of the vector input and the second cyclic convolution circuit operates on a second portion of the vector input.
 18. The method of claim 17, wherein the first portion of the vector input is a 3×1 portion of the vector input, and wherein the second portion of the vector input is selected from a group consisting of: a 3×4 portion of the vector input, and a 3×8 portion of the vector input.
 19. The method of claim 14, wherein the method further comprises: adding instances of the convolved output with instances of a cyclic convolution output to yield a corresponding instance of a vector sum; and shifting instances of the vector sum to yield the instances of the cyclic convolution output.
 20. A data storage device, the device comprising: a storage medium; a head disposed in relation to the storage medium and operable to write an encoded data set to the storage medium; an encoder circuit including: a cyclic convolution circuit operable to multiply a vector input derived from a user data input by a portion of a circulant matrix to yield a convolved output; and an encoded output circuit operable to generate the encoded data set corresponding to the user data input and based at least in part on the convolved output.